Improved smoothed analysis of the k-means method

Authors

Bodo Manthey

Heiko Röglin

Abstract

The k-means method is a widely used clustering algorithm. One of its distinguished features is its speed in practice. Its worst-case running-time, however, is exponential, leaving a gap between practical and theoretical performance. Arthur and Vassilvitskii [3] aimed at closing this gap, and they proved a bound of $\mathrm{poly}(n^k, \sigma^{-1})$ on the smoothed running-time of the k-means method, where n is the number of data points and σ is the standard deviation of the Gaussian perturbation. This bound, though better than the worst-case bound, is still much larger than the running-time observed in practice. We improve the smoothed analysis of the k-means method by showing two upper bounds on the expected running-time of k-means. First, we prove that the expected running-time is bounded by a polynomial in $n^{\sqrt{k}}$ and $\sigma^{-1}$. Second, we prove an upper bound of $k^{kd} \cdot \mathrm{poly}(n, \sigma^{-1})$, where d is the dimension of the data space. The polynomial is independent of k and d, and we obtain a polynomial bound for the expected running-time for $k, d \in O(\sqrt{\log n / \log\log n})$. Finally, we show that k-means runs in smoothed polynomial time for one-dimensional instances.
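The bounds above are stated in the smoothed model: an adversary fixes n data points, each coordinate is then perturbed by independent Gaussian noise with standard deviation σ, and one bounds the expected number of iterations the k-means method (Lloyd's algorithm: assign each point to its nearest center, move each center to the centroid of its cluster, repeat until the clustering no longer changes) needs on the perturbed instance. The following is a minimal pure-Python sketch of that setup, not the authors' code. The helper names `perturb` and `lloyd_kmeans`, the tie-breaking rule, the `max_iter` guard, and the uniform "adversarial" base instance in `[0, 1]^2` are hypothetical choices made for illustration only; running it says nothing about the constants in the paper's bounds.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def perturb(points: Sequence[Point], sigma: float, rng: random.Random) -> List[Point]:
    """Smoothed-analysis step: add independent N(0, sigma^2) noise to every coordinate."""
    return [tuple(x + rng.gauss(0.0, sigma) for x in p) for p in points]


def sq_dist(p: Point, q: Point) -> float:
    """Squared Euclidean distance, the cost k-means minimizes."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd_kmeans(points: Sequence[Point], init_centers: Sequence[Point], max_iter: int = 10_000):
    """Run the k-means method from the given initial centers.

    Returns (centers, assignment, iterations), where iterations counts the
    assign-then-update rounds performed before the clustering stopped changing.
    Hypothetical choices: ties go to the lowest-index center (ties occur with
    probability zero on Gaussian-perturbed inputs) and max_iter is a safety cap.
    """
    centers = [tuple(c) for c in init_centers]
    assignment: Optional[List[int]] = None
    iterations = 0
    while iterations < max_iter:
        # Assignment step: every point joins its nearest center.
        new_assignment = [
            min(range(len(centers)), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        if new_assignment == assignment:
            break  # local optimum reached: the clustering did not change
        assignment = new_assignment
        # Update step: move each center to the centroid of its cluster.
        for j in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # an empty cluster keeps its old center
                dim = len(members[0])
                centers[j] = tuple(sum(m[i] for m in members) / len(members) for i in range(dim))
        iterations += 1
    return centers, assignment, iterations


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical "adversarial" instance: 200 points drawn uniformly from [0, 1]^2.
    base = [(rng.random(), rng.random()) for _ in range(200)]
    noisy = perturb(base, sigma=0.05, rng=rng)
    _, labels, iters = lloyd_kmeans(noisy, rng.sample(noisy, 3))
    print("iterations until convergence:", iters)
```

Repeating the `__main__` experiment over many seeds and averaging `iters` gives an empirical estimate of the expected running-time that the smoothed bounds control; varying `sigma`, n, k, and d shows which parameters the iteration count is sensitive to on such toy instances.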

Similar resources

Persistent K-Means: Stable Data Clustering Algorithm Based on K-Means Algorithm

Identifying clusters, or clustering, is an important aspect of data analysis. It is the task of grouping a set of objects in such a way that objects in the same group (cluster) are more similar to each other in some sense. It is a main task of exploratory data mining and a common technique for statistical data analysis. This paper proposes an improved version of the K-Means algorithm, namely Persistent K...

k-Means has Polynomial Smoothed Complexity

The k-means method is one of the most widely used clustering algorithms, drawing its popularity from its speed in practice. Recently, however, it was shown to have exponential worst-case running time. In order to close the gap between practical performance and theoretical analysis, the k-means method has been studied in the model of smoothed analysis. But even the smoothed analyses so far are u...

Combination of Transformed-means Clustering and Neural Networks for Short-Term Solar Radiation Forecasting

In order to provide efficient conversion and utilization of solar power, solar radiation data should be measured continuously and accurately over the long term. However, solar radiation measurements are not available in all countries of the world due to technical and fiscal limitations. Hence, several studies were proposed in the literature to find mathematical and physical mod...

Numerical Investigation of Vertical and Horizontal Baffle Effects on Liquid Sloshing in a Rectangular Tank Using an Improved Incompressible Smoothed Particle Hydrodynamics Method

Liquid sloshing is a common phenomenon in the transport of liquid tanks. Liquid waves lead to fluctuating forces on the tank wall. If these fluctuations are not predicted or controlled, they can lead to large forces and momentum. Baffles can control liquid sloshing fluctuations. One numerical method widely used to model liquid sloshing phenomena is Smoothed Particle Hydrodynamics (SPH)....

Worst-Case and Smoothed Analysis of the k-Means Method with Bregman Divergences

The k-means algorithm is the method of choice for clustering large-scale data sets and it performs exceedingly well in practice despite its exponential worst-case running-time. To narrow the gap between theory and practice, k-means has been studied in the semi-random input model of smoothed analysis, which often leads to more realistic conclusions than mere worst-case analysis. For the case tha...

Publication date: 2009